Abstract—In this paper, we study the two-way relay channel with energy harvesting nodes. In particular, we find transmission policies that maximize the sum-throughput for two-way relay channels when the relay does not employ a data buffer. The relay can perform decode-and-forward, compress-and-forward, compute-and-forward or amplify-and-forward relaying. Furthermore, we consider throughput improvement by dynamically choosing relaying strategies, resulting in hybrid relaying strategies. We show that an iterative generalized directional water-filling algorithm solves the offline throughput maximization problem, with the achievable sum-rate from an individual or hybrid relaying scheme. In addition to the optimum offline policy, we obtain the optimum online policy via dynamic programming. We provide numerical results for each relaying scheme to support the analytic findings, pointing to the advantage of adapting the instantaneous relaying strategy to the available harvested energy.

Index Terms—Energy harvesting nodes, two-way relay channel, decode/compute/compress/amplify-and-forward, hybrid relaying strategies, throughput maximization.

I. Introduction 

Wireless networks consisting of energy harvesting nodes continue to gain significance in the area of green communications [1]-[3]. These networks harvest energy from external sources in an intermittent fashion, and consequently require careful management of the available energy.

There is considerable recent research on energy management for energy harvesting networks. Reference [4] considers an energy harvesting transmitter with energy and data arrivals, and an infinite size battery to store the harvested energy, and shows the optimality of a piecewise constant power policy for minimization of completion time of a file transfer. In [5], the throughput maximization problem is solved when the energy storage capacity of the battery is limited. It is shown that the transmission power policy is again piecewise constant, changing only when the battery is full or depleted. Extension of the model in [5] to fading channels is studied in [6], where a directional water-filling algorithm is shown to yield the optimum transmission policy. Reference [7] also considers throughput maximization for a fading channel under the same assumption. The impact of degradation and imperfections of energy storage on the throughput maximizing policies is studied in [8]-[10]. The single user channel with an energy harvesting transmitter and an energy harvesting receiver is considered in [11], and decoding and sampling strategies for energy harvesting receivers are considered in [12]. Various multi-user energy harvesting networks have also been studied to date, including multiple access, broadcast, and interference channels with energy harvesting nodes [13]-[18]. In addition to these multi-user setups, variations of the energy harvesting relay channel are studied in [19]-[?], including multiple energy harvesting relays [?].

In this work, we study the simplest network setup that embodies a cooperative communication scenario with two-directional information flow, with the goal of identifying design insights unique to such scenarios. This leads to the investigation of bi-directional communication with energy harvesting nodes. Specifically, we study the so-called separated two-way relay channel^1 with energy harvesting nodes. The channel is separated in the sense that the users cannot hear each other directly, i.e., communication is only possible through the relay. This model is relevant and of interest for peer-to-peer communications, or for any scenario where a pair of nodes exchange information, and allows the relay node to implement strategies that convey both messages simultaneously. The two-way relay channel (TWRC) with conventional (non-energy-harvesting) nodes is studied with various relaying strategies such as amplify-and-forward, decode-and-forward, compress-and-forward [27]-[29], and compute-and-forward [?] in half-duplex [27], [28] and full-duplex [29], [?] models. It is observed that different relaying schemes outperform the others for different ranges of transmit powers.

In this paper, we identify transmission power policies for the energy harvesting two-way relay channel (EH-TWRC) which maximize the sum-throughput. The energy harvesting relay can perform amplify-and-forward, decode-and-forward, compress-and-forward, or compute-and-forward relaying. Due to intermittent energy availability, the channel calls for relaying strategies that adapt to varying transmit powers. For this purpose, we introduce a relay that can dynamically change its relaying strategy, resulting in what we term hybrid relaying strategies. We derive the properties of the optimal offline transmission policy, where energy arrivals are known non-causally, with the goal of gaining insights into its structure. Next, we show that an iterative generalized directional water-filling algorithm solves the sum-throughput maximization problem for all relaying strategies. We next find the optimal online transmission policy by formulating and solving a dynamic program, where the energy states of the nodes are known causally. We compute optimal policies for different relaying strategies and provide numerical comparisons of their sum-throughputs. Our contribution includes generalization of directional water-filling [6] to an interactive communication scenario with multiple energy harvesting terminals in the offline setting, as well as the identification of optimal policies in the online setting. The interactive communication scenario considered in this paper is the catalyst that can drastically change the resulting power allocation algorithms in the energy harvesting setting. The two-way relay channel is the simplest multi-terminal network model that demonstrates this interaction, and hence is the model considered. We observe that the relaying strategy has a significant impact on the optimum transmission policy, i.e., transmit powers and phase durations, and that hybrid relaying can provide a notable throughput improvement for the EH-TWRC.

^1 Usually referred to as the two-way relay channel, as we will in the sequel.

Fig. 1. The two-way relay channel with energy harvesting nodes (EH-TWRC).

The remainder of the paper is organized as follows. The system model is described in Section II. In Section III, a hybrid relaying scheme where the relay can alter its strategy depending on the instantaneous powers is introduced. In Section IV, the sum-throughput maximization problem is presented for an EH-TWRC, and is divided into subproblems that can be solved separately. In Section V, the iterative generalized directional water-filling algorithm is proposed to find an optimal policy for the EH-TWRC. The online policy based on dynamic programming is provided in Section VI. Numerical results are presented in Section VII. The paper is concluded in Section VIII.

II. System Model

We consider an additive white Gaussian noise (AWGN) two-way relay channel with two source nodes, $T_1$ and $T_2$, that convey independent messages to each other through a relay node $T_3$. The two source nodes cannot hear each other directly, hence all messages are sent through the relay. The channels to and from a source node are reciprocal^2, with power gains $h_{13}$ between nodes $T_1$ and $T_3$ and $h_{23}$ between nodes $T_2$ and $T_3$. We consider the delay limited scenario, where the relay forwards messages as soon as they are received, and thus has no data buffer. The channel model is shown in Figure 1.

^2 While this assumption is for the sake of simplicity, we note that the results of this paper directly extend to models without reciprocity.

Fig. 2. The energy harvesting model for node $T_j$, $j = 1, 2, 3$.


All nodes $T_1$, $T_2$ and $T_3$ are powered by energy harvesting. Node $T_j$, $j = 1, 2, 3$, harvests $E_{j,n} \geq 0$ units of energy^3 at time $s_n$, and stores it in a battery of energy storage capacity $E_{j,\max}$. Any energy in excess of the storage capacity of the battery is lost. The initial charge of the battery of node $T_j$ is represented by $E_{j,1}$, with $s_1 = 0$ by definition. The time between the $n$th and $(n+1)$th energy arrivals, referred to as the $n$th epoch in the sequel, is denoted by $l_n = s_{n+1} - s_n$. We remark that the model does not require all nodes to harvest energy packets simultaneously; rather, epochs are constructed as the intervals between any two consecutive energy arrivals. A node that does not receive any energy at the $n$th harvest is assigned $E_{j,n} = 0$. The energy harvesting model is depicted in Figure 2.
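
The epoch construction described above can be sketched in a few lines. The function name `build_epochs`, the dictionary layout and the numeric arrival values are hypothetical choices made only for illustration; the rule itself (epoch boundaries are the union of all arrival instants, and a node without an arrival at a boundary is assigned zero energy) is the one stated in the model.

```python
def build_epochs(arrivals, session_end):
    """Merge per-node energy arrivals into common epochs.

    arrivals: dict mapping node index j -> list of (time, energy) pairs.
    session_end: end time of the session, i.e. s_{N+1}.
    Returns (times, lengths, energies) where times[n] is s_{n+1},
    lengths[n] is the epoch length and energies[j][n] is E_{j,n+1}
    (Python lists are 0-indexed, the paper is 1-indexed).
    """
    # Epoch boundaries: every instant at which any node harvests energy.
    times = sorted({t for events in arrivals.values() for t, _ in events})
    # s_1 = 0 by definition, even if no node has an initial charge.
    if not times or times[0] != 0:
        times = [0] + times
    boundaries = times + [session_end]
    lengths = [b - a for a, b in zip(boundaries[:-1], boundaries[1:])]

    # A node with no arrival at a boundary gets E_{j,n} = 0.
    index = {t: n for n, t in enumerate(times)}
    energies = {j: [0.0] * len(times) for j in arrivals}
    for j, events in arrivals.items():
        for t, e in events:
            energies[j][index[t]] += e
    return times, lengths, energies


# Hypothetical arrival profile for T1, T2 and T3 over a session of length 5.
times, lengths, energies = build_epochs(
    {1: [(0, 2.0), (3, 1.0)], 2: [(0, 1.0), (1, 4.0)], 3: [(0, 0.5)]}, 5
)
```

In this example the three nodes harvest at different instants, so the session splits into three epochs even though no single node sees more than two arrivals.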

We consider a transmission session of $N$ epochs, with length $s_{N+1}$, for which the energy harvesting profile consists of $E_{j,n}$ and $s_n$ for $j = 1, 2, 3$ and $n = 1, \ldots, N$. In epoch $n$, $n = 1, \ldots, N$, node $T_j$, $j = 1, 2, 3$, allocates an average power $p_{j,n}$ for transmission, i.e., a total energy of $l_n p_{j,n}$ is consumed for transmitting. Since the energy available to each node is limited by the energy harvested and stored in the respective battery, the energy harvesting profile determines the feasibility of $p_{j,n}$ for each node. Specifically, the transmission powers satisfy

$$0 \leq l_n p_{j,n} \leq B_{j,n}, \quad j = 1, 2, 3, \quad n = 1, \ldots, N, \tag{1}$$

where $B_{j,n}$ is the energy available to node $j$ at the beginning of epoch $n$, which evolves as

$$B_{j,n+1} = \min\{E_{j,\max},\ B_{j,n} - l_n p_{j,n} + E_{j,n+1}\}. \tag{2}$$
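
As a minimal sketch of how a battery recursion of this kind can be checked for a single node, the snippet below steps through the epochs, rejects any power that draws more energy than is stored, and accumulates the energy lost to overflow. It assumes the recursion $B_{n+1} = \min\{E_{\max}, B_n - l_n p_n + E_{n+1}\}$ with $B_1 = \min\{E_{\max}, E_1\}$; the function name `simulate_battery` and the numeric profile are hypothetical.

```python
def simulate_battery(E, l, p, E_max, tol=1e-12):
    """Track the stored energy of one node across epochs.

    E[n], l[n], p[n]: harvested energy, epoch length and average power of
    epoch n+1 (0-indexed lists). Returns (levels, lost): the stored energy
    at the start of each epoch and the total energy lost to overflow.
    Raises ValueError if a power draws more than the stored energy.
    """
    levels = [min(E_max, E[0])]
    lost = max(0.0, E[0] - E_max)
    for n in range(len(E)):
        used = l[n] * p[n]
        # Feasibility: consumed energy must lie between 0 and the stored energy.
        if used < 0 or used > levels[n] + tol:
            raise ValueError(f"epoch {n + 1}: power is infeasible")
        if n + 1 < len(E):
            stored = levels[n] - used + E[n + 1]
            lost += max(0.0, stored - E_max)   # excess energy is discarded
            levels.append(min(E_max, stored))  # battery update
    return levels, lost


# Hypothetical profile: three epochs, battery capacity 4.
levels, lost = simulate_battery([2, 3, 0], [1, 2, 1], [1, 1, 2], 4)
```

Lowering the first power to 0.5 in this profile keeps the policy feasible but wastes 0.5 units of energy when the second arrival fills the battery.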

In this work, similar to references [?], the energy harvesting profile is known non-causally by all nodes, so that offline optimal policies and performance limits of the network can be found^4. The communication overhead for conveying energy arrival information and power allocation decisions is considered to be negligible compared to the amount of data transferred in each epoch.

We consider the problem of finding the power policy which maximizes the sum-throughput of the system under different relaying strategies such as decode-and-forward, compute-and-forward, compress-and-forward and amplify-and-forward. In the next subsection, we present the rate regions for these relaying strategies.

^3 Recent efforts extend the discrete energy arrivals to continuous ones for the single user channel and reach similar insights, albeit with a more involved analysis [?].

^4 We provide the online policy with causal energy arrival information in Section VI.

Fig. 3. Comparison of sum-rates for a symmetric full-duplex channel with $h_{13} = h_{23} = 1$ at $p_3 = 2$. Amplify-and-forward rates remain just below compress-and-forward and thus are not visible.

A. Rate Regions with Average Power Constraints 

We focus on a two-phase communication scheme, consisting of a multiple access phase from nodes $T_1$ and $T_2$ to $T_3$ and a broadcast phase from $T_3$ to $T_1$ and $T_2$. This is referred to as multiple access broadcast (MABC) in [27], [28]. Its three-phase counterpart, time division broadcast (TDBC) [27], [28], can be shown to perform no better than MABC in the absence of a direct channel between $T_1$ and $T_2$, and is therefore omitted. For half-duplex nodes, the rates achievable with decode-and-forward, compress-and-forward, amplify-and-forward and compute-and-forward relaying schemes are derived in [27], [28]. These works consider nodes that are constrained by their instantaneous transmit powers, and do not consider total consumed energy, which depends on the duration of multiple access and broadcast phases. Since our model is energy-constrained, we revise the results of these works by scaling transmit powers with phase duration, thereby replacing instantaneous transmit powers with average transmit powers $p_{j,n}$. We denote the set of rate pairs achievable with average transmit powers $p_1$, $p_2$ and $p_3$ and multiple access phase duration $\Delta$ as $\mathcal{R}_{HD}(p_1, p_2, p_3, \Delta)$ in the half-duplex case. The duration of the broadcast phase is $\bar{\Delta} = 1 - \Delta$. For full-duplex nodes, due to simultaneous multiple access and broadcast phases, there is no need for the time sharing factor $\Delta$; we use $\mathcal{R}_{FD}(p_1, p_2, p_3)$ to denote the achievable set of rate pairs. In this case, the full-duplex nodes can remove the self-interference term from the received signal, as in [29]. We use a subscript to denote the relaying strategy where necessary.

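Decode-and-Forward: In this scheme, the relay decodes the messages of both source nodes in the multiple access phase, and transmits a function of the two messages in the broadcast phase. Nodes $T_1$ and $T_2$ use the broadcast message along with their own messages to find the ones intended for them. For half-duplex nodes, the rate region $\mathcal{R}_{DF\text{-}HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n)$ in epoch $n$ is defined by^5

$$R_{1,n} \leq \min\left\{ \Delta_n C\!\left(\frac{h_{13} p_{1,n}}{\Delta_n}\right),\ \bar{\Delta}_n C\!\left(\frac{h_{23} p_{3,n}}{\bar{\Delta}_n}\right) \right\}, \tag{3a}$$

$$R_{2,n} \leq \min\left\{ \Delta_n C\!\left(\frac{h_{23} p_{2,n}}{\Delta_n}\right),\ \bar{\Delta}_n C\!\left(\frac{h_{13} p_{3,n}}{\bar{\Delta}_n}\right) \right\}, \tag{3b}$$

$$R_{1,n} + R_{2,n} \leq \Delta_n C\!\left(\frac{h_{13} p_{1,n} + h_{23} p_{2,n}}{\Delta_n}\right), \tag{3c}$$

where $C(p) = \frac{1}{2}\log(1 + p)$. With full-duplex radios, the two phases take place simultaneously, achieving instantaneous rates $(R_{1,n}, R_{2,n}) \in \mathcal{R}_{DF\text{-}FD}(p_{1,n}, p_{2,n}, p_{3,n})$ which are found by substituting $\Delta_n = \bar{\Delta}_n = 1$ in (3).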
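
The decode-and-forward bounds can be evaluated numerically as in the sketch below. It assumes the region has the form "two individual-rate bounds plus one sum-rate bound", each built from $C(p) = \frac{1}{2}\log(1+p)$ with powers scaled by the phase durations, and it takes the logarithm to base 2 so that rates are in bits; the function names and default gains are hypothetical. For such a region the largest sum-rate is the smaller of the sum of the two individual bounds and the sum-rate bound.

```python
import math


def C(x):
    """C(p) = 0.5 * log2(1 + p); base 2 is an assumption made here."""
    return 0.5 * math.log2(1.0 + x)


def df_sum_rate(p1, p2, p3, h13=1.0, h23=1.0, delta=None):
    """Largest R1 + R2 in a decode-and-forward region of the form above.

    delta is the multiple access phase duration for half-duplex nodes;
    delta=None selects full duplex (both phase durations set to 1).
    """
    if delta is None:
        d_ma = d_bc = 1.0
    else:
        d_ma, d_bc = delta, 1.0 - delta
    if d_ma <= 0.0 or d_bc <= 0.0:
        return 0.0  # one of the two phases has no time allocated
    # Individual bounds: uplink to the relay vs. downlink to the other user.
    r1 = min(d_ma * C(h13 * p1 / d_ma), d_bc * C(h23 * p3 / d_bc))
    r2 = min(d_ma * C(h23 * p2 / d_ma), d_bc * C(h13 * p3 / d_bc))
    # Multiple access sum-rate bound at the relay.
    r_sum = d_ma * C((h13 * p1 + h23 * p2) / d_ma)
    return min(r1 + r2, r_sum)


full_duplex = df_sum_rate(1.0, 1.0, 2.0)             # symmetric channel
half_duplex = df_sum_rate(1.0, 1.0, 2.0, delta=0.5)  # equal phase durations
```

With these symmetric values the multiple access sum-rate bound is the active one in both modes, so the half-duplex figure is lower only because the uplink phase occupies half of the epoch.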

Compress-and-Forward: In this scheme, the relay transmits a compressed version of its received signal in the broadcast phase. The instantaneous rates $(R_{1,n}, R_{2,n}) \in \mathcal{R}_{CF\text{-}HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n)$, $0 \leq \Delta_n \leq 1$, for the MABC half-duplex case satisfy


Rl,n < AnC 


R2,n A AnC 


(cr^^^)2/ll3Pl,n/A„ 


((7^^^)^fi23F2,n/A„ 


Pi 
^ y 


(1)/d(1) 




(1) 


(4a) 


(4b) 


for some > 0 and ay'^’ > 0 , where Py^’ = hispi^n/An + 
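$h_{23} p_{2,n}/\Delta_n + 1$. The full-duplex rates $(R_{1,n}, R_{2,n}) \in \mathcal{R}_{CF\text{-}FD}(p_{1,n}, p_{2,n}, p_{3,n})$ are

$$R_{1,n} \leq C\!\left(\frac{h_{13} p_{1,n}}{1 + \sigma_c^2}\right), \quad R_{2,n} \leq C\!\left(\frac{h_{23} p_{2,n}}{1 + \sigma_c^2}\right), \tag{5}$$

where $\sigma_c^2 = \max\{\sigma_{c1}^2, \sigma_{c2}^2\}$,

$$\sigma_{c1}^2 = \frac{1 + h_{23} p_{2,n}}{2^{2R_{3,n}} - 1}, \quad \sigma_{c2}^2 = \frac{1 + h_{13} p_{1,n}}{2^{2R_{3,n}} - 1}, \tag{6a}$$

$$R_{3,n} = \min\{C(h_{13} p_{3,n}),\ C(h_{23} p_{3,n})\}. \tag{6b}$$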

Amplify-and-Forward: In this scheme, the relay broadcasts a scaled version of its received signal. Since this is performed on a symbol-by-symbol basis, the durations allocated to the multiple access and broadcast phases are equal. The rate regions $\mathcal{R}_{AF}(p_{1,n}, p_{2,n}, p_{3,n})$ are found as

$$R_{1,n} \leq \Delta_n C\!\left(\frac{h_{13} h_{23} p_{1,n} p_{3,n}}{\Delta_n \left(h_{13} p_{1,n} + h_{23} (p_{2,n} + p_{3,n}) + \Delta_n\right)}\right), \tag{7a}$$

$$R_{2,n} \leq \Delta_n C\!\left(\frac{h_{13} h_{23} p_{2,n} p_{3,n}}{\Delta_n \left(h_{23} p_{2,n} + h_{13} (p_{1,n} + p_{3,n}) + \Delta_n\right)}\right), \tag{7b}$$

by substituting $\Delta_n = 0.5$ for the half-duplex case and $\Delta_n = 1$ for the full-duplex case.
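
A minimal sketch of how the amplify-and-forward bounds can be evaluated is given below. It assumes each rate is $\Delta\, C(\text{SNR})$ with an end-to-end SNR of the form $h_{13} h_{23}\, p_k\, p_3 / (\Delta\,(\cdot + \Delta))$, where the bracket collects the received powers at the relay and at the destination, and it uses a base-2 logarithm; the names `af_rates` and `full_duplex` are hypothetical.

```python
import math


def C(x):
    """C(p) = 0.5 * log2(1 + p); base 2 is an assumption made here."""
    return 0.5 * math.log2(1.0 + x)


def af_rates(p1, p2, p3, h13=1.0, h23=1.0, full_duplex=False):
    """Upper bounds on (R1, R2) for amplify-and-forward relaying.

    The phase duration is 1 for full duplex and 0.5 for half duplex,
    since the relay forwards symbol by symbol.
    """
    d = 1.0 if full_duplex else 0.5
    # Message of T1 is recovered at T2, which hears the relay through h23.
    snr1 = h13 * h23 * p1 * p3 / (d * (h13 * p1 + h23 * (p2 + p3) + d))
    # Message of T2 is recovered at T1, which hears the relay through h13.
    snr2 = h13 * h23 * p2 * p3 / (d * (h23 * p2 + h13 * (p1 + p3) + d))
    return d * C(snr1), d * C(snr2)


r1_fd, r2_fd = af_rates(1.0, 1.0, 2.0, full_duplex=True)
r1_hd, r2_hd = af_rates(1.0, 1.0, 2.0)
```

On a symmetric channel the two bounds coincide, and the half-duplex rates are lower despite the higher instantaneous SNR because of the pre-log factor of 0.5.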


^5 The power gains, transmit powers and harvested energy values are normalized in order to obtain an effective noise variance of 1 at each node. This is done by first scaling $h_{13}$ and $h_{23}$ to establish unit variance noise at nodes $T_1$ and $T_2$, and subsequently scaling the transmit power, available energy and battery capacity at nodes $T_1$ and $T_2$ to yield a unit variance noise at $T_3$.



























Fig. 4. Comparison of sum-rates for a symmetric half-duplex channel with $h_{13} = h_{23} = 1$ at $p_3 = 2$. Amplify-and-forward rates remain just below compress-and-forward and thus are barely visible.


Compute-and-Forward (Lattice Forwarding): In this scheme, nested lattice codes are used at the source nodes, and the relay decodes and broadcasts a function of the two messages received from the sources. Each source then calculates the intended message using the side information of its own message [33]. The rate region $\mathcal{R}_{LF\text{-}HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n)$ achievable with this scheme for an MABC half-duplex relay consists of rates satisfying

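$$R_{1,n} \leq \min\left\{ \frac{\Delta_n}{2} \log^+\!\left(\frac{p_{1,n}}{p_{1,n} + p_{2,n}} + \frac{h_{13} p_{1,n}}{\Delta_n}\right),\ \bar{\Delta}_n C\!\left(\frac{h_{23} p_{3,n}}{\bar{\Delta}_n}\right) \right\}, \tag{8a}$$

$$R_{2,n} \leq \min\left\{ \frac{\Delta_n}{2} \log^+\!\left(\frac{p_{2,n}}{p_{1,n} + p_{2,n}} + \frac{h_{23} p_{2,n}}{\Delta_n}\right),\ \bar{\Delta}_n C\!\left(\frac{h_{13} p_{3,n}}{\bar{\Delta}_n}\right) \right\}, \tag{8b}$$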

where $\bar{\Delta}_n = 1 - \Delta_n$ and $\log^+(x) = \max\{\log x, 0\}$. The full-duplex rate region $\mathcal{R}_{LF\text{-}FD}(p_{1,n}, p_{2,n}, p_{3,n})$ can be evaluated by substituting $\Delta_n = \bar{\Delta}_n = 1$ in (8). In reference [?], it is shown that this strategy achieves within $\frac{1}{2}$ bit of TWRC capacity in each epoch.

It can be observed that the compute-and-forward rates are not jointly concave in transmit powers $p_{j,n}$, $j = 1, 2, 3$. This implies that time sharing between two sets of transmit powers $(p'_{1,n}, p'_{2,n}, p'_{3,n})$ and $(p''_{1,n}, p''_{2,n}, p''_{3,n})$ with parameter $\lambda$, consuming average powers $p_{j,n} = \lambda p'_{j,n} + (1 - \lambda) p''_{j,n}$, $j = 1, 2, 3$, can yield rates $(R_{1,n}, R_{2,n}) \notin \mathcal{R}_{LF\text{-}FD}(p_{1,n}, p_{2,n}, p_{3,n})$. To include rates achievable as such, we concavify the rate region by extending $\mathcal{R}_{LF\text{-}FD}(p_{1,n}, p_{2,n}, p_{3,n})$ to include all time-sharing combinations with average power $(p_{1,n}, p_{2,n}, p_{3,n})$, i.e.,

$$\mathcal{R}^{C}_{LF\text{-}FD}(p_{1,n}, p_{2,n}, p_{3,n}) = \Big\{ (R_{1,n}, R_{2,n}) \ \Big|\ R_{k,n} = \sum_i \lambda_i R_{k,n,i},\ (R_{1,n,i}, R_{2,n,i}) \in \mathcal{R}_{LF\text{-}FD}(p_{1,n,i}, p_{2,n,i}, p_{3,n,i}),$$
$$\sum_i \lambda_i = 1,\ \sum_i \lambda_i p_{j,n,i} \leq p_{j,n},\ \lambda_i \geq 0,\ j = 1, 2, 3,\ k = 1, 2 \Big\}, \tag{9}$$

which we refer to as the concavified rate region. This extends to the half-duplex relaying region $\mathcal{R}_{LF\text{-}HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n)$ by time sharing among $\Delta_n$ as well. With a slight abuse of notation, we will denote the concavified regions with $\mathcal{R}_{LF\text{-}FD}(p_{1,n}, p_{2,n}, p_{3,n})$ and $\mathcal{R}_{LF\text{-}HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n)$ in the sequel. We note that all rates in the concavified region are achievable via time-sharing within an epoch, while the average powers within said epoch, and hence energy constraints, hold by definition. A formal proof of this concavification follows [?, Lem. 1] closely. In the sequel, we use the concavified region, though we do not reiterate the required time-sharing for clarity of exposition.
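
The effect of concavification can be seen on a one-dimensional toy example. The sketch below is not the rate region itself: it assumes a scalar rate curve $r(p) = \frac{1}{2}\log_2^+(\frac{1}{2} + p)$, chosen only because the $\log^+$ clipping makes it non-concave near zero, and it restricts time-sharing to an "on-off" pattern (transmit with power $p/\lambda$ for a fraction $\lambda$ of the epoch, stay silent otherwise). The names `toy_rate` and `on_off_time_sharing` are hypothetical.

```python
import math


def toy_rate(p):
    """Illustrative non-concave rate curve with a log^+ clipping at zero."""
    return max(0.5 * math.log2(0.5 + p), 0.0)


def on_off_time_sharing(rate, p, grid=2000):
    """Best rate at average power p using on-off time-sharing.

    For a fraction lam of the epoch the node transmits with power p / lam,
    and it is silent otherwise, so the average power is exactly p. This
    assumes rate(0) == 0 and only searches a grid of lam values.
    """
    best = rate(p)  # lam = 1: no time-sharing
    for i in range(1, grid + 1):
        lam = i / grid
        best = max(best, lam * rate(p / lam))
    return best


low = on_off_time_sharing(toy_rate, 0.4)   # toy_rate(0.4) itself is 0
high = on_off_time_sharing(toy_rate, 5.0)  # time-sharing brings no gain here
```

At low average power the clipped curve yields zero rate, while bursting at a higher instantaneous power for part of the epoch yields a strictly positive rate with the same energy; at high power the curve is already concave and time-sharing does not help.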

Since we are interested in maximizing sum-throughput, we compare the maximum achievable sum-rates for full- and half-duplex nodes employing the relaying strategies above in Figures 3 and 4, respectively. In these evaluations, a symmetric channel model normalized to yield $h_{13} = h_{23} = 1$ and a fixed relay power of $p_3 = 2$ are considered. It can be observed that different schemes may outperform the others depending on the instantaneous transmit power, and thus the selection of the correct relaying scheme is of importance in an energy harvesting setting where transmit powers are likely to change throughout the transmission.


III. Hybrid Schemes 


In Section II-A, it is observed that depending on the transmit powers, any one of the relaying strategies may yield the best instantaneous sum-rate. Due to the intrinsic variability of harvested energy, transmit powers may change significantly throughout the transmission period based on the energy availability of nodes. Consequently, a dynamic relay that chooses its relaying strategy based on instantaneous transmit powers of the nodes can potentially improve system throughput.

Another benefit of switching between relaying strategies is achieving time-sharing rates across strategies, e.g., switching between decode-and-forward and compute-and-forward strategies within an epoch, which can outperform both individual strategies with the same average power. An example of the benefits of time-sharing in a two-way relay channel is reference [?], where time-sharing between different operation modes is considered. In [?], a fixed relaying strategy is employed with different nodes transmitting at a time, while here we allow time-sharing between different relaying strategies.

The rates achievable with a hybrid strategy switching between the four relaying schemes in Figures 3 and 4 consist of the convex hull of the union of rate pairs achievable by the individual schemes. The rate region for the hybrid scheme is expressed as

$$\mathcal{R}_{HYB}(p_1, p_2, p_3) = \Big\{ (R_1, R_2) \ \Big|\ R_k = \sum_i \lambda_i R_{k,i},\ \sum_i \lambda_i = 1,\ \sum_i \lambda_i p_{j,i} \leq p_j,\ \lambda_i \geq 0,$$
$$(R_{1,i}, R_{2,i}) \in \mathcal{R}_{DF} \cup \mathcal{R}_{LF} \cup \mathcal{R}_{CF} \cup \mathcal{R}_{AF}(p_{1,i}, p_{2,i}, p_{3,i}),\ j = 1, 2, 3,\ k = 1, 2 \Big\}, \tag{10}$$

where $\mathcal{R}_{DF}$, $\mathcal{R}_{LF}$, $\mathcal{R}_{CF}$ and $\mathcal{R}_{AF}$ are the rate regions given in Section II-A with decode-and-forward, compute-and-forward, compress-and-forward and amplify-and-forward, respectively.

Fig. 5. Chosen relaying strategy for a symmetric half-duplex channel with $h_{13} = h_{23} = 1$ at $p_3 = 2$. The labels "over D&F" and "over LF" denote which of the two strategies is better by itself in that region.
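
The gain from time-sharing across strategies can be illustrated with a scalar toy model. The sketch below is an assumption-laden simplification rather than the hybrid region itself: both users transmit with the same power $p$, the relay is assumed to have ample power, and the two curves `df_toy` and `lf_toy` are illustrative stand-ins for a decode-and-forward-like and a compute-and-forward-like sum-rate. The function `hybrid_sum_rate` then searches two-point time-sharing over a power grid, which is the scalar analogue of taking the convex hull of the union.

```python
import math


def df_toy(p):
    """Toy decode-and-forward-like sum-rate (multiple access bound only)."""
    return 0.5 * math.log2(1.0 + 2.0 * p)


def lf_toy(p):
    """Toy compute-and-forward-like sum-rate with log^+ clipping."""
    return max(math.log2(0.5 + p), 0.0)


def hybrid_sum_rate(p, schemes, p_max=10.0, grid=200):
    """Best sum-rate at average power p when time-sharing across schemes.

    Two operating points a <= p <= b are used for fractions lam and
    1 - lam of the epoch, with lam * a + (1 - lam) * b == p, and each
    point uses whichever scheme is best at that power.
    """
    points = [i * p_max / grid for i in range(grid + 1)]
    best_single = [max(f(x) for f in schemes.values()) for x in points]
    value = max(f(p) for f in schemes.values())  # no time-sharing
    for a, rate_a in zip(points, best_single):
        if a > p:
            break
        for b, rate_b in zip(points, best_single):
            if b <= a or b < p:
                continue
            lam = (b - p) / (b - a)  # fraction of time spent at power a
            value = max(value, lam * rate_a + (1.0 - lam) * rate_b)
    return value


schemes = {"DF": df_toy, "LF": lf_toy}
single = max(f(1.5) for f in schemes.values())
hybrid = hybrid_sum_rate(1.5, schemes)
```

In this toy model the two curves cross at $p = 1.5$, where the better single strategy yields a sum-rate of 1, while alternating between a low-power decode-and-forward point and a high-power compute-and-forward point yields strictly more at the same average power.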

For the purpose of demonstration, we present the chosen relaying scheme that maximizes the instantaneous sum-rate for a half-duplex channel with fixed relay transmit power, $p_3 = 2$, in Figure 5. It can be observed that while decode-and-forward or compute-and-forward alone are chosen at the extremes, a time-sharing of the two strategies is favored in between. In this figure, the regions where the hybrid scheme uses time-sharing are shown in two shades of blue. We note that for these channel parameters, the remaining relaying schemes underperform these two for any choice of transmit powers.

With these observations, we conclude that policies with 
hybrid relaying strategies can instantaneously surpass the 
sum-rates resulting from individual relaying schemes for a 
considerable set of power vectors. Furthermore, time-sharing 
between relaying strategies may strictly outperform the best 
relaying strategy alone. Numerical results on the performance 
of optimal hybrid schemes in comparison with individual 
schemes are presented in Section VII.


IV. Problem Definition and Properties of the
Optimal Solution 

We consider the problem of sum-throughput maximization for a session of $N$ epochs. Since achievable rates are either jointly concave in transmit powers or can be concavified by the use of time sharing as in (9), it follows that the optimal transmit powers remain constant within each epoch, as noted in [?, Lemma 2]. The power policy of the network consists of the power vectors $(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)$, where $\mathbf{p}_j = (p_{j,1}, p_{j,2}, \ldots, p_{j,N})$, $j = 1, 2, 3$, and in the case of half-duplex relaying, the time sharing parameters $\Delta_n$, $n = 1, \ldots, N$. For the set of feasible power policies, we first present the following proposition, which is the multi-user extension of [?, Lemma 2]:

Proposition 1: There exist optimal average transmit powers $(\mathbf{p}_1^*, \mathbf{p}_2^*, \mathbf{p}_3^*)$ that do not yield a battery overflow at any of the nodes throughout the communication session.

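Proof: Let $(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)$ be a vector of transmit powers yielding battery overflows, i.e.,

$$\sum_{i=1}^{n} E_{j,i} - \sum_{i=1}^{n-1} l_i p_{j,i} - E_{j,\max} = E^{o}_{j,n} > 0 \tag{11}$$

for some $j$ and $n$. For each battery overflow of amount $E^{o}_{j,n}$ at node $T_j$ at the end of epoch $n$, let $\tilde{p}_{j,n} = p_{j,n} + E^{o}_{j,n}/l_n$. For the remaining powers, let $\tilde{p}_{j,n} = p_{j,n}$. The power policy defined by $(\tilde{\mathbf{p}}_1, \tilde{\mathbf{p}}_2, \tilde{\mathbf{p}}_3)$ does not overflow the battery at any time, and satisfies $\tilde{p}_{j,n} \geq p_{j,n}$ for all $j$ and $n$. Note that nodes consuming powers $\tilde{p}_j$ can achieve any rate pair that is achievable with less power, i.e.,

$$p_j \leq \tilde{p}_j,\ j = 1, 2, 3 \ \Rightarrow\ \mathcal{R}_{FD}(p_1, p_2, p_3) \subseteq \mathcal{R}_{FD}(\tilde{p}_1, \tilde{p}_2, \tilde{p}_3), \tag{12a}$$

$$\Rightarrow\ \mathcal{R}_{HD}(p_1, p_2, p_3, \Delta) \subseteq \mathcal{R}_{HD}(\tilde{p}_1, \tilde{p}_2, \tilde{p}_3, \Delta), \tag{12b}$$

for full-duplex and half-duplex nodes with $0 \leq \Delta \leq 1$, respectively. Therefore, the sum-rate obtained by $(\tilde{\mathbf{p}}_1, \tilde{\mathbf{p}}_2, \tilde{\mathbf{p}}_3)$ at any epoch $n$ is no less than that of $(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)$. Hence, for any policy with battery overflows, we can find a policy performing at least as well without overflows. ■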

We remark that even though (12) does not hold immediately, e.g., for the amplify-and-forward rates in (7), it holds by definition for the concavified rates in (9). By choosing $\lambda_1 = 1$ and $p_{j,n,1} \leq p_{j,n}$ in (9), a portion of the allocated power $p_{j,n}$ can equivalently be discarded at the node. Consequently, Proposition 1 applies to all concavified relaying schemes presented in Section II-A.

As a consequence of Proposition 1, we will restrict the feasible set of policies to those that do not overflow the battery without loss of generality. In epoch $n$, the nodes choose transmit powers $(p_{1,n}, p_{2,n}, p_{3,n})$, a time sharing parameter $\Delta_n$, and a rate pair $(R_{1,n}, R_{2,n}) \in \mathcal{R}_{HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n)$ in the case of half-duplex radios. The objective is to maximize the sum-throughput of the TWRC within $N$ epochs, where the transmit powers are constrained by harvested energy and the rates are constrained by the rate region. We express the EH-TWRC sum-throughput maximization problem as

$$\max_{\mathbf{R}_1, \mathbf{R}_2, \mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3, \boldsymbol{\Delta}} \ \sum_{i=1}^{N} l_i (R_{1,i} + R_{2,i}) \tag{13a}$$

$$\text{s.t.} \quad (R_{1,n}, R_{2,n}) \in \mathcal{R}_{HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n), \tag{13b}$$

$$\sum_{i=1}^{n} l_i p_{j,i} - \sum_{i=1}^{n} E_{j,i} \leq 0, \tag{13c}$$

$$\sum_{i=1}^{n} E_{j,i} - \sum_{i=1}^{n-1} l_i p_{j,i} \leq E_{j,\max}, \tag{13d}$$

$$0 \leq \Delta_n \leq 1, \quad j = 1, 2, 3, \quad n = 1, 2, \ldots, N, \tag{13e}$$

for half-duplex nodes, where $\mathbf{p}_j = (p_{j,1}, p_{j,2}, \ldots, p_{j,N})$, $j = 1, 2, 3$, $\mathbf{R}_k = (R_{k,1}, R_{k,2}, \ldots, R_{k,N})$, $k = 1, 2$, and $\boldsymbol{\Delta} = (\Delta_1, \Delta_2, \ldots, \Delta_N)$. Here, (13d) is due to Proposition 1 and (13c) is equivalent to (1) given (13d). While the rates are a function of the powers of the nodes and the time sharing parameters $\Delta_n$, $n = 1, 2, \ldots, N$, this dependency is now deferred to (13b), which is the constraint that ensures the rates are selected from the achievable region dictated by the power and time sharing parameters. The energy causality constraints given in (13c) ensure that the energy consumed by a node is not greater than the energy harvested up to that epoch. The no-overflow constraints given in (13d) ensure that the battery capacity is not exceeded. Any power policy $(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)$ satisfying both (13c) and (13d) for all $j$ and $n$ is considered a feasible power policy. The problem for full-duplex nodes is obtained by replacing (13b) with $(R_{1,n}, R_{2,n}) \in \mathcal{R}_{FD}(p_{1,n}, p_{2,n}, p_{3,n})$ and omitting the time-sharing variables $\Delta_n$, $n = 1, \ldots, N$.


We next show that (13) can be decomposed by separating the maximization over $p_{1,n}$, $p_{2,n}$, and $p_{3,n}$, $n = 1, \ldots, N$, and the maximization over $R_{1,n}$, $R_{2,n}$, $\Delta_n$, $n = 1, \ldots, N$, as

$$\max_{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3} \ \max_{\mathbf{R}_1, \mathbf{R}_2, \boldsymbol{\Delta}} \ \sum_{i=1}^{N} l_i (R_{1,i} + R_{2,i}) \tag{14a}$$

$$\text{s.t.} \quad (R_{1,n}, R_{2,n}) \in \mathcal{R}_{HD}(p_{1,n}, p_{2,n}, p_{3,n}, \Delta_n), \tag{14b}$$

$$\sum_{i=1}^{n} l_i p_{j,i} - \sum_{i=1}^{n} E_{j,i} \leq 0, \tag{14c}$$

$$\sum_{i=1}^{n} E_{j,i} - \sum_{i=1}^{n-1} l_i p_{j,i} \leq E_{j,\max}, \tag{14d}$$

$$0 \leq \Delta_n \leq 1, \quad j = 1, 2, 3, \quad n = 1, 2, \ldots, N. \tag{14e}$$

Note that only the constraints in (14b) pertain to the parameters of the second maximization. Next, we observe that the constraints in (14b) are separable in $n$, and the objective is a linear function of $R_{1,n}$ and $R_{2,n}$. Hence, the second maximization can be carried out separately for each $n$, i.e., in an epoch-by-epoch fashion, yielding the separated problem


$$\max_{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3} \ \sum_{i=1}^{N} l_i R_s(p_{1,i}, p_{2,i}, p_{3,i}) \tag{15a}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} l_i p_{j,i} \leq \sum_{i=1}^{n} E_{j,i}, \tag{15b}$$

$$\sum_{i=1}^{n} E_{j,i} - \sum_{i=1}^{n-1} l_i p_{j,i} \leq E_{j,\max}, \tag{15c}$$

$$j = 1, 2, 3, \quad n = 1, 2, \ldots, N, \tag{15d}$$

where $R_s(p_{1,i}, p_{2,i}, p_{3,i})$ is the solution to

$$\max_{R_{1,i}, R_{2,i}, \Delta_i} \ R_{1,i} + R_{2,i} \tag{16a}$$

$$\text{s.t.} \quad (R_{1,i}, R_{2,i}) \in \mathcal{R}_{HD}(p_{1,i}, p_{2,i}, p_{3,i}, \Delta_i), \tag{16b}$$

$$0 \leq \Delta_i \leq 1, \tag{16c}$$

within a single epoch $i$ with fixed powers $(p_{1,i}, p_{2,i}, p_{3,i})$. This implies that the optimal transmit rates within each epoch are the sum-rate maximizing rates for the given transmit powers within that epoch. Thus, we refer to the function $R_s(p_{1,i}, p_{2,i}, p_{3,i})$ as the maximum epoch sum-rate. For full-duplex nodes, the maximum epoch sum-rate is found by solving

$$\max_{R_{1,i}, R_{2,i}} \ R_{1,i} + R_{2,i} \tag{17a}$$

$$\text{s.t.} \quad (R_{1,i}, R_{2,i}) \in \mathcal{R}_{FD}(p_{1,i}, p_{2,i}, p_{3,i}) \tag{17b}$$

instead, and the power policy optimization is identical to (15).
We next show a property of policies that solve the problem in (13).

Lemma 1: There exists an optimal policy which depletes 
the batteries of all nodes at the end of transmission. 

Proof: Let $(\mathbf{R}_1, \mathbf{R}_2, \mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)$ be a transmission policy which leaves energy $\varepsilon_j$ in the battery of node $j$ at the end of transmission. Consider the transmission policy $(\tilde{\mathbf{R}}_1, \tilde{\mathbf{R}}_2, \tilde{\mathbf{p}}_1, \tilde{\mathbf{p}}_2, \tilde{\mathbf{p}}_3)$ which has $\tilde{p}_{j,N} = p_{j,N} + \varepsilon_j / l_N$, and equals the original policy elsewhere. Hence, this policy expends the remaining energy in the battery of $T_j$ in the last epoch, depleting the batteries. We have $\tilde{R}_{k,n} = R_{k,n}$ for $n = 1, 2, \ldots, N - 1$ and $\tilde{R}_{k,N} \geq R_{k,N}$, for $k = 1, 2$, due to (12). Therefore, the sum-throughput of the new policy cannot be lower than that of the original policy. ■

V. Identifying the Optimal Policy

Now that we have formulated the problem and identified some necessary properties of the optimal policy, we next find the optimal power policy for the EH-TWRC. We establish this using a generalization of the directional water-filling algorithm in [6], which gives the optimal policy for a single transmitter fading channel. In this section, we show the optimality of the generalized directional water-filling algorithm and verify its convergence.

A. Solution of the EH-TWRC Sum-Throughput Maximization 
Problem 

To find the optimal policy, we first find the maximum epoch sum-rate by solving (16) and (17) for half-duplex and full-duplex nodes, respectively. The following property of $R_s(p_{1,i}, p_{2,i}, p_{3,i})$ can be immediately observed for any relaying scheme.

Lemma 2: The maximum epoch sum-rate $R_s(p_{1,i}, p_{2,i}, p_{3,i})$ is jointly concave in transmit powers $p_{1,i}$, $p_{2,i}$, and $p_{3,i}$.

Proof: The proof follows from the concavity of the objectives (16a) and (17a), and the convexity of the constraint sets (16b) and (17b). Let $(R_1, R_2)$ and $(\tilde{R}_1, \tilde{R}_2)$ denote two feasible rate pairs, and $\Delta$ and $\tilde{\Delta}$ their time-sharing parameters for transmit powers $(p_1, p_2, p_3)$ and $(\tilde{p}_1, \tilde{p}_2, \tilde{p}_3)$, respectively. Let $\hat{R}_k = \alpha R_k + (1 - \alpha) \tilde{R}_k$, $k = 1, 2$, $\hat{p}_j = \alpha p_j + (1 - \alpha) \tilde{p}_j$, $j = 1, 2, 3$, and $\hat{\Delta} = \alpha \Delta + (1 - \alpha) \tilde{\Delta}$ denote the convex combination of the policies with parameter $0 \leq \alpha \leq 1$. Then, for all relaying schemes, $(\hat{R}_1, \hat{R}_2) \in \mathcal{R}_{FD}(\hat{p}_1, \hat{p}_2, \hat{p}_3)$ or $(\hat{R}_1, \hat{R}_2) \in \mathcal{R}_{HD}(\hat{p}_1, \hat{p}_2, \hat{p}_3, \hat{\Delta})$ follows either from the definition of the rate region, or from (9). ■
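
The maximum epoch sum-rate and its concavity can be probed numerically, as in the sketch below for half-duplex decode-and-forward. It assumes a region with two individual-rate bounds and one sum-rate bound built from $C(p) = \frac{1}{2}\log_2(1+p)$ with powers scaled by the phase durations, and it maximizes over the phase duration by ternary search, which is valid here because the sum-rate is concave in the phase duration under that assumption. The function names and test points are hypothetical, and the midpoint check is a sanity test, not a proof.

```python
import math


def C(x):
    """C(p) = 0.5 * log2(1 + p); base 2 is an assumption made here."""
    return 0.5 * math.log2(1.0 + x)


def df_hd_sum_rate(p1, p2, p3, delta, h13=1.0, h23=1.0):
    """Largest R1 + R2 for half-duplex decode-and-forward at a fixed delta."""
    bc = 1.0 - delta
    r1 = min(delta * C(h13 * p1 / delta), bc * C(h23 * p3 / bc))
    r2 = min(delta * C(h23 * p2 / delta), bc * C(h13 * p3 / bc))
    r_sum = delta * C((h13 * p1 + h23 * p2) / delta)
    return min(r1 + r2, r_sum)


def max_epoch_sum_rate(p1, p2, p3, iters=200):
    """Maximize the sum-rate over the phase duration by ternary search.

    Returns (best sum-rate, best delta) for fixed average powers.
    """
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if df_hd_sum_rate(p1, p2, p3, m1) < df_hd_sum_rate(p1, p2, p3, m2):
            lo = m1  # the maximizer lies to the right of m1
        else:
            hi = m2  # the maximizer lies to the left of m2
    delta = 0.5 * (lo + hi)
    return df_hd_sum_rate(p1, p2, p3, delta), delta


# Midpoint concavity check between two hypothetical power vectors.
a, b = (0.5, 0.5, 1.0), (3.0, 2.0, 4.0)
mid = tuple(0.5 * (x + y) for x, y in zip(a, b))
rs_a, rs_b, rs_mid = (max_epoch_sum_rate(*v)[0] for v in (a, b, mid))
```

Optimizing the phase duration matters: for $p_1 = p_2 = 1$ and $p_3 = 2$ the sum-rate at equal phase durations is limited by the multiple access bound, and lengthening the multiple access phase improves it.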

As a consequence of Lemma 2, (15) is a convex program. We next provide the iterative generalized directional water-filling algorithm to compute the optimal power policy. Consider the power allocation problem in (15) for an arbitrary relaying scheme with the maximum epoch sum-rate $R_s(p_1, p_2, p_3)$. Here, the constraints in (15b) and (15c) are separable among $j = 1, 2, 3$. Hence, a block coordinate descent algorithm, i.e., alternating maximization, can be employed [34]. In each iteration, the power allocation problem for node $T_j$, $j = 1, 2, 3$, given by

$$\max_{\mathbf{p}_j \geq 0} \ \sum_{n=1}^{N} l_n R_s(p_{1,n}, p_{2,n}, p_{3,n}) \tag{18a}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} l_i p_{j,i} \leq \sum_{i=1}^{n} E_{j,i}, \tag{18b}$$

$$\sum_{i=1}^{n} E_{j,i} - \sum_{i=1}^{n-1} l_i p_{j,i} \leq E_{j,\max}, \tag{18c}$$

$$n = 1, \ldots, N, \tag{18d}$$

is solved while keeping the remaining power levels $\mathbf{p}_k$, $k \neq j$, constant. This is a convex single user problem, and the solution satisfies the KKT stationarity conditions and complementary slackness conditions [34]


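$$l_n \frac{\partial R_s(p_{1,n}, p_{2,n}, p_{3,n})}{\partial p_{j,n}} - l_n \sum_{i=n}^{N} (\lambda_i - \beta_i) + \gamma_n = 0, \tag{19a}$$

$$\lambda_n \left( \sum_{i=1}^{n} l_i p_{j,i} - \sum_{i=1}^{n} E_{j,i} \right) = 0, \tag{19b}$$

$$\beta_n \left( \sum_{i=1}^{n} E_{j,i} - \sum_{i=1}^{n-1} l_i p_{j,i} - E_{j,\max} \right) = 0, \quad \gamma_n p_{j,n} = 0, \tag{19c}$$

for all $n = 1, \ldots, N$ where $\lambda_n \geq 0$, $\beta_n \geq 0$ and $\gamma_n \geq 0$ are the Lagrange multipliers for energy causality, battery capacity and transmit power non-negativity constraints, respectively. Hence, the optimal transmit power policy for $T_j$, i.e., $\mathbf{p}_j^*$, is the solution to

$$\frac{\partial R_s(p_{1,n}, p_{2,n}, p_{3,n})}{\partial p_{j,n}} = \sum_{i=n}^{N} (\lambda_i - \beta_i) - \frac{\gamma_n}{l_n} \tag{20}$$

for all $n = 1, \ldots, N$ which follows from (19a). Note that due to (19b) and (19c), the Lagrange multipliers are nonzero only when the respective constraints are met with equality.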

Fig. 6. Depiction of generalized directional water-filling for $T_j$ with N = S epochs. Note that the battery of the node is full at the end of the 5th epoch, preventing further energy flow into the 6th epoch.

We argue that the solution to (20) can be interpreted as a generalization of the directional water-filling algorithm [6], similar to the case in [?]. In [6], optimal transmit powers are found by treating the available energy in each epoch as water, and letting water levels equalize by flowing in the forward direction only. The associated algorithm is termed directional water-filling. Here, we instead define the generalized water levels for $T_j$ as


$$\nu_{j,n}(p_{j,n}) = \left( \frac{\partial R_s(p_{1,n}, p_{2,n}, p_{3,n})}{\partial p_{j,n}} \right)^{-1} \qquad (21)$$
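As a sanity check on the definition in (21), consider a hypothetical special case that is not one of the relaying rates studied here: suppose the sum-rate depends on $p_{j,n}$ only through a single-link term with an assumed gain $h > 0$, using the natural logarithm. Under that assumption the generalized water level, taken as the inverse of the marginal rate, reduces to a scaled version of the classical water level $1/h + p_{j,n}$:

```latex
\begin{align*}
R_s &= \tfrac{1}{2}\log\left(1 + h\,p_{j,n}\right), \\
\frac{\partial R_s}{\partial p_{j,n}} &= \frac{h}{2\left(1 + h\,p_{j,n}\right)}, \\
\nu_{j,n}(p_{j,n}) &= \left(\frac{\partial R_s}{\partial p_{j,n}}\right)^{-1}
  = 2\left(\frac{1}{h} + p_{j,n}\right).
\end{align*}
```

In this special case, equalizing $\nu_{j,n}$ across epochs with equal gains is the same as equalizing transmit powers, which is the behavior expected from ordinary directional water-filling.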


The following properties of $\nu_{j,n}$ are readily observed for the
optimal policy: (a) while $p_{j,n} > 0$, the water levels remain
constant among epochs unless the battery is empty or full,
increasing only when the battery is empty, and decreasing only
when the battery is full, and (b) if a positive solution to

$$\frac{\partial R_s(p_{1,n}, p_{2,n}, p_{3,n})}{\partial p_{j,n}} = \sum_{i=n}^{N} (\lambda_i - \beta_i) \qquad (22)$$


does not exist, then $p_{j,n} = 0$ and $\gamma_n > 0$. These properties
imply that the optimal policy can be found by performing
directional water-filling using the generalized water levels
in (21), and calculating the corresponding transmit powers
$p_{j,n}$. Water flow is only in the forward direction and the
corresponding energy flow is bounded by $E_{j,\max}$ for node
$T_j$. Hence, the flow between two neighboring epochs stops
when water levels in (21) are equalized or when the total
energy flow reaches $E_{j,\max}$. The initial water levels are found
by substituting the initial transmit powers $p_{j,n} = E_{j,n}/l_n$ in
(21). This algorithm yields transmit powers that satisfy the two
properties above by construction. An example of generalized
directional water-filling is depicted in Figure 6.
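The forward-only flow described above can be sketched for a single node as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-epoch rate is taken to be the hypothetical single-link form $\frac{1}{2}\log(1 + h_n p)$, and the function names, the per-epoch gains `h`, and the sweep and tolerance parameters are all introduced here for illustration only.

```python
def water_level(h, p):
    """Generalized water level: inverse marginal rate of the assumed
    per-epoch rate R(p) = 0.5 * log(1 + h * p)."""
    return 2.0 * (1.0 + h * p) / h


def directional_water_filling(E, l, E_max, h, sweeps=500, tol=1e-12):
    """Forward-only water flow between neighboring epochs (single node).

    E: harvested energy per epoch, l: epoch lengths, E_max: battery size,
    h: hypothetical per-epoch gains that make the water levels differ.
    """
    N = len(E)
    p = [E[n] / l[n] for n in range(N)]  # initial powers: spend each harvest in its epoch
    s = list(E)                          # battery level at the start of each epoch
    for _ in range(sweeps):
        moved = 0.0
        for n in range(1, N):
            def gap(x):
                # Level difference after moving energy x from epoch n-1 to epoch n.
                return (water_level(h[n - 1], p[n - 1] - x / l[n - 1])
                        - water_level(h[n], p[n] + x / l[n]))

            # Flow is capped by the battery room and by the energy left in epoch n-1.
            x_max = min(E_max - s[n], p[n - 1] * l[n - 1])
            if x_max <= tol or gap(0.0) <= 0.0:
                continue                 # no forward flow is needed or possible
            if gap(x_max) >= 0.0:
                x = x_max                # a cap binds before the levels equalize
            else:
                lo, hi = 0.0, x_max      # bisection for the equalizing flow
                for _ in range(60):
                    mid = 0.5 * (lo + hi)
                    if gap(mid) > 0.0:
                        lo = mid
                    else:
                        hi = mid
                x = 0.5 * (lo + hi)
            p[n - 1] -= x / l[n - 1]
            p[n] += x / l[n]
            s[n] += x
            moved += x
        if moved <= tol:
            break
    return p
```

With harvests `[4, 0, 0]` and a large battery, the sweeps approach equal powers of 4/3 in every epoch. With harvests `[3, 3, 0]` and `E_max = 3`, no energy can be carried into the second epoch because the battery is already full, so the first epoch keeps a higher level than the later ones.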

The iterative generalized directional water-filling (IGDWF)
algorithm employs generalized directional water-filling sequentially
for each user until all power levels $\mathbf{p}_j$, $j = 1,2,3$,
converge, i.e., alternating maximization. Although optimization
is carried out separately for a single user at each iteration,
the transmit powers of all users interact through the generalized
water levels in (21). Starting from the initial values
$p_{j,n}^{0} = E_{j,n}/l_n$, the $k$th iteration of the algorithm, optimizing
$\mathbf{p}_j^{k}$ for $j = (k \bmod 3) + 1$, is given in Algorithm 1.

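Algorithm 1 Iteration $k$ of Iterative Generalized Directional
Water-filling

1) Let $j = (k \bmod 3) + 1$, $\mathbf{p}_j^{k} = \mathbf{p}_j^{0}$, $\mathbf{p}_i^{k} = \mathbf{p}_i^{k-1}$ for
$i \neq j$, $s_n = E_{j,n}$, $n = 1, \ldots, N$.

2) for $n = 2, \ldots, N$, do

Find the set $\mathcal{E} = \{E_\Delta \geq 0 \mid \nu_{j,n-1}(p_{j,n-1}^{k} - E_\Delta/l_{n-1}) = \nu_{j,n}(p_{j,n}^{k} + E_\Delta/l_n),\ s_n + E_\Delta \leq E_{j,\max}\}$,

if $\mathcal{E} = \emptyset$ and $\nu_{j,n-1}(p_{j,n-1}^{k}) > \nu_{j,n}(p_{j,n}^{k})$ then

assign $\mathcal{E} = \{E_{j,\max} - s_n\}$,

Find $E_\Delta \in \mathcal{E}$ and assign

$p_{j,n-1}^{k} = p_{j,n-1}^{k} - E_\Delta/l_{n-1}$, $p_{j,n}^{k} = p_{j,n}^{k} + E_\Delta/l_n$, $s_n = s_n + E_\Delta$

such that $\|\mathbf{p}_j^{k} - \mathbf{p}_j^{k-1}\|$ is minimized.

end for

3) Repeat 2 until $\nu_{j,n-1}(p_{j,n-1}^{k}) \leq \nu_{j,n}(p_{j,n}^{k})$ or $s_n = E_{j,\max}$
for all $n$.


Remark 1: At each iteration of the IGDWF algorithm, the
water flow out of each of the $N$ epochs can be found using a
binary search. This requires updating at most $N$ water levels
following each epoch. Hence, the computational complexity
of each iteration is $O(N^2)$, i.e., quadratic in the number of
epochs.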

B. Convergence of the IGDWF Algorithm 

For the alternating maximization in Section V-A to converge
to an optimal policy, it is sufficient that the feasible set is the
intersection of convex constraints that are separable among
$j = 1,2,3$, and the continuously differentiable objective yields
a unique maximum in each iteration [34, Prop. 2.7.1]. In
this case, the objective in (15a) is concave and continuously
differentiable for all relaying strategies, with compute-and-forward
satisfying this condition after concavification.
The feasible set (15b)-(15c) is separable among $j = 1,2,3$
as well. However, the objective does not necessarily yield a
unique maximum at each iteration since it is not strictly concave
in transmit powers. To overcome this, we introduce the
unconstrained variables $\mathbf{s}_j = (s_{j,1}, \ldots, s_{j,N})$ for $j = 1, 2, 3$,
and modify the objective in (15a) as

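$$f(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3, \mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3) = \sum_{n=1}^{N} l_n R_s(p_{1,n}, p_{2,n}, p_{3,n}) - \epsilon_1 \|\mathbf{p}_1 - \mathbf{s}_1\|^2 - \epsilon_2 \|\mathbf{p}_2 - \mathbf{s}_2\|^2 - \epsilon_3 \|\mathbf{p}_3 - \mathbf{s}_3\|^2, \qquad (23)$$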

where $\epsilon_j > 0$, $j = 1,2,3$, are arbitrarily small parameters.
The objective in (23) is maximized by a unique $\mathbf{p}_j$ in each
iteration with $j = 1, 2$, or $3$. The iterations optimizing $\mathbf{s}_j$
trivially yield the unique solution $\mathbf{s}_j = \mathbf{p}_j$. Therefore, the
problem now satisfies the convergence property for alternating
maximization, and converges to the global maximum of (15)
[34, Ex. 2.7.2].

Note that through (23) and the arbitrarily small $\epsilon_j$, we
essentially introduce resistance to the iterative algorithm. That
is, if the original objective in (15a) yields multiple solutions
for some $j$, the objective in (23) has a unique solution that
is closest to the previous value of $\mathbf{p}_j$. Consequently, if there
exists more than one optimal solution to (15) at one of the
iterations for some $j$, the power policy $\mathbf{p}_j$ that is closest to
the previous one is chosen. This is ensured by choosing the
flow amount $E_\Delta$ which minimizes $\|\mathbf{p}_j^{k} - \mathbf{p}_j^{k-1}\|$ in Step 2
of Algorithm 1.
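The tie-breaking effect of the small quadratic penalty can be seen in a one-dimensional toy problem. The sketch below is an illustration only: the flat objective `flat_rate`, the grid search, and all names are assumptions introduced here, standing in for a sum-rate that stops growing once another phase becomes the bottleneck.

```python
def argmax_on_grid(objective, lo=0.0, hi=2.0, steps=2000):
    """Return the grid point in [lo, hi] that maximizes objective(p)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=objective)


def flat_rate(p):
    """Toy concave objective: every p in [1, 2] is a maximizer."""
    return min(p, 1.0)


def regularized_choice(previous, eps=1e-3):
    """Maximize flat_rate(p) - eps * (p - previous)**2 over the grid.

    The small penalty leaves the maximal rate unchanged but selects the
    maximizer closest to the previous iterate.
    """
    return argmax_on_grid(lambda p: flat_rate(p) - eps * (p - previous) ** 2)
```

If the previous iterate already lies in the flat region, for example `previous = 1.7`, the penalized problem returns that same point. If it lies below the flat region, for example `previous = 0.2`, the result is the nearest maximizer, `p = 1`.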


VI. Online Power Policy with Dynamic Programming


The power allocation policy we have considered so far is an 
offline policy, in the sense that the energy harvest amounts and 
times are known to all nodes in advance. Although the offline 
approach is useful for predictable energy harvesting scenarios 
m and as a benchmark, it is also meaningful to develop 
policies that only rely on past and current energy states, 
i.e., causal information only. We refer to such transmission 
policies as online policies. Recent efforts that consider online 
algorithms for energy harvesting nodes in various channel 
models include El, Q, Qa, ca, Ea-iMl. Building upon
the previous work, in this section, we find the optimal online 
policy for power allocation in the two-way relay channel. 

The epoch length $l_n$ indicates that no energy will be
harvested for a duration of $l_n$ after the $n$th energy arrival.
Therefore, in the online problem, the epoch lengths are not
known by the nodes causally. Instead, we divide the transmission
period into time slots of length $\tau$, and recalculate transmit
powers at the beginning of each time slot. We assume that
each energy harvest takes place at the beginning of some time
slot. Note that with smaller $\tau$, this model gets arbitrarily close
to the general model in Section II. We assume that harvests
$E_{j,n}$ in time slot $n$ are independent and identically distributed.
In time slot $n$, nodes $T_1$, $T_2$ and $T_3$ have access to previous
energy harvests $E_{j,i}$, $j = 1,2,3$, and $i = 1, \ldots, n$. The nodes
decide on transmit powers $p_{j,n}$, $j = 1,2,3$, through actions

$$p_{j,n} = \phi_{j,n}(\{E_{k,i};\ k = 1,2,3,\ i = 1,\ldots,n\}), \qquad (24)$$

where $\{E_{k,i};\ k = 1,2,3,\ i = 1,\ldots,n\}$ denotes all energy
arrivals prior to, and including, time slot $n$. Each time slot with
transmit powers $\{p_{j,n}\}$ contributes to the additive objective
through the sum-rate function $R_s(p_{1,n}, p_{2,n}, p_{3,n})$ in (fThl) and
(fTTl) for full-duplex and half-duplex modes, respectively. We
consider the problem of finding the optimal set of actions for
this setting, which can be formulated as the following dynamic
program [39]:




$$\cdots = \max_{\phi_{1,n}, \phi_{2,n}, \phi_{3,n}} R_s\big(\phi_{j,n}(\{E_{k,i}\})\big) + E\Big[\sum^{N} \cdots\Big] \qquad (25)$$


Here $N$ is the number of time slots and $E[\cdot]$ denotes the conditional
expectation over remaining energy harvests $\{E_{j,i}\}_{i=n+1}^{N}$
given the previous harvests $\{E_{j,i}\}_{i=1}^{n}$.

Note that the dynamic program outlined by (25) is computationally
difficult due to the dimension of the problem.
However, for the case of i.i.d. energy harvests that we consider,
it can be simplified by restricting to actions that only
utilize current battery state. This is due to the expectation in
(25) being independent of past energy harvests. This implies
simplifying the actions in (24) as


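$$p_{j,n} = \phi_{j,n}\big(E_{1,n}^{\mathrm{bat}}, E_{2,n}^{\mathrm{bat}}, E_{3,n}^{\mathrm{bat}}\big), \qquad (26a)$$

$$E_{j,n}^{\mathrm{bat}} = \sum_{i=1}^{n} E_{j,i} - \tau \sum_{i=1}^{n-1} p_{j,i}, \qquad (26b)$$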


where $E_{j,n}^{\mathrm{bat}}$ is the battery state of $T_j$ at the beginning of slot
$n$. The solution to (25) and (26) provides the optimal online
power policy for finite horizon, which we compare with the
offline policy in Section VII.

To further simplify the problem, we additionally consider 
the infinite horizon problem where the optimal actions are 
time-invariant. We formulate this as a discounted dynamic 
program with the Bellman equation 

$$V(\{E_j^{\mathrm{bat}}\}) = \max_{\phi_1, \phi_2, \phi_3} R_s\big(\phi_j(\{E_j^{\mathrm{bat}}\})\big) + \beta E\Big[V\big(\{E_j^{\mathrm{bat}} - \tau \phi_j(\{E_j^{\mathrm{bat}}\}) + E_j\}\big)\Big], \qquad (27)$$

where $\phi_j = \phi_{j,n}$ for all $n$ and the expectation is over
$E_j$, $j = 1,2,3$. This equation can be solved with value
iteration [39]. Namely, starting from arbitrary initial actions,
all actions $\phi_j$ are updated as the arguments that maximize
(27), and value functions $V(\{E_j^{\mathrm{bat}}\})$ are updated as in (27),
until all actions converge to some $\phi_j^*$. Here, the discount factor
$\beta < 1$ ensures that the values $V(\{E_j^{\mathrm{bat}}\})$ remain bounded 1^ .
The resulting actions yield an online policy that is optimal
under the action restrictions in (26) and infinite transmission
assumption. Hence, we refer to this policy as the optimal
online policy for an infinite horizon.
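To make the value iteration step concrete, the sketch below solves a much smaller, single-node analogue of the discounted problem. Everything specific in it is an assumption for illustration: one node instead of three, integer battery levels `0..B`, harvests uniform on `{0, ..., H}`, unit slot length, reward $\frac{1}{2}\log_2(1 + a)$ for spending energy `a`, a battery that discards overflow, and a stopping rule based on convergence of the values rather than of the actions.

```python
import math


def value_iteration(B=10, H=3, beta=0.9, tol=1e-9):
    """Discounted value iteration for a toy single-node energy harvesting model.

    State: battery level b in {0, ..., B}. Action: energy a <= b spent in the slot.
    Next state: min(B, b - a + e), with harvest e uniform on {0, ..., H}.
    Returns the value function V and the stationary policy (energy spent per state).
    """
    def rate(a):
        return 0.5 * math.log2(1.0 + a)   # assumed single-link reward

    V = [0.0] * (B + 1)
    while True:
        new_V, policy = [], []
        for b in range(B + 1):
            best_val, best_a = -1.0, 0
            for a in range(b + 1):
                # Expectation over the i.i.d. harvest arriving before the next slot.
                expected_next = sum(V[min(B, b - a + e)]
                                    for e in range(H + 1)) / (H + 1)
                val = rate(a) + beta * expected_next
                if val > best_val:
                    best_val, best_a = val, a
            new_V.append(best_val)
            policy.append(best_a)
        diff = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if diff < tol:
            return V, policy
```

Because the resulting policy depends only on the current battery level, it can be stored as a single lookup table, which mirrors the time-invariant structure of the infinite horizon policy described above.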

Remark 2: Each value iteration step requires value 
updates, where K is the number of transmit power values 
after discretization. Hence, the running time of the finite 
horizon algorithm is 0{NK^) for N epochs, and storing the 
optimal policy requires 0{NK^) space. On the other hand, the 
optimal policy is time-invariant for the infinite horizon case, 
and requires only 0{K^) space. 

VII. Numerical Results 

In this section, we demonstrate the optimal policies for 
the two-way relay channel and compare the performance 
of the schemes in Section II-A and Section III in the EH
setting. In simulations, energy arrivals to node Tj are generated 
independently from a uniform distribution over $[0, E_{h,j}]$ for
$j = 1, 2, 3$, with unit epoch lengths $l_n = 1$ s. The noise density
is $10^{-19}$ W/Hz at all nodes and the bandwidth is 1 MHz.

Examples for the optimal transmit power policies found
using the algorithm described in Section V are shown in
Figures 7-10 for decode-and-forward relaying. In each figure,
the cumulative energy consumed by the nodes for transmission is
plotted, the derivative of which yields the average transmit
powers of the nodes in each epoch. In the figures, $T_1$&$T_2$
stands for the total cumulative energy of the nodes $T_1$ and
$T_2$, and MAC fraction represents the fraction of the multiple
access phase, i.e., $\Delta_n$. We remark that concavified sum-rate
functions are used for the simulations, and average transmit
powers are shown in the plots for clarity. Pairs of staircases,
shown in red and green, represent energy causality and battery




Fig. 7. Optimal cumulative harvested energy and consumed energy policies
for (a) node $T_1$ and sum of $T_1$ and $T_2$, and (b) node $T_3$, for an asymmetric
full-duplex channel with decode-and-forward relaying, $h_{13} = -110$ dB,
$h_{23} = -116$ dB, peak energy harvesting rates $E_{h,1} = E_{h,2} = 50$ mJ
and $E_{h,3} = 10$ mJ, battery sizes $E_{1,\max} = E_{2,\max} = 50$ mJ and
$E_{3,\max} = 10$ mJ.


capacity constraints on the cumulative power, which is referred 
to as the feasible energy tunnel a. A feasible policy remains 
between these two constraints throughout the transmission 
period. Figures 7 and 8 are plotted for full-duplex nodes
while Figures 9 and 10 are plotted for half-duplex nodes. Both
scenarios are considered for an asymmetric EH-TWRC with
$h_{13} \neq h_{23}$ in Figures 7 and 9 and for a symmetric EH-TWRC
with $h_{13} = h_{23}$ in Figures 8 and 10.
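Whether a given power policy stays inside the feasible energy tunnel can be checked directly from the energy causality and battery capacity constraints. The helper below is a small sketch with hypothetical names, assuming the battery constraint is evaluated right after each harvest, i.e., energy harvested through epoch $n$ minus energy consumed through epoch $n-1$:

```python
def in_feasible_tunnel(p, E, l, E_max, tol=1e-9):
    """Check a single node's power policy against the two staircases.

    p: transmit power per epoch, E: harvested energy per epoch,
    l: epoch lengths, E_max: battery capacity.
    """
    consumed = 0.0
    harvested = 0.0
    for p_n, E_n, l_n in zip(p, E, l):
        if p_n < -tol:
            return False                  # transmit powers must be non-negative
        harvested += E_n
        # Battery capacity: stored energy right after the nth harvest.
        if harvested - consumed > E_max + tol:
            return False
        consumed += l_n * p_n
        # Energy causality: cannot spend more than has been harvested so far.
        if consumed > harvested + tol:
            return False
    return True
```

For harvests `[3, 3, 0]` with `E_max = 3`, the policy `[3, 1.5, 1.5]` passes, while the constant policy `[2, 2, 2]` overflows the battery at the second harvest and `[4, 1, 1]` violates energy causality in the first epoch.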

We remark that unlike previous work with simpler channel 
models, e.g., a, a, 03 , Ea, the optimal cumulative energy 
or sum-power policy is not necessarily the shortest path that 
traverses the feasible tunnel. Figure 7 shows a setting with
$E_{h,1} = E_{h,2} = 50$ mJ and $E_{h,3} = 10$ mJ, i.e., the relay
is energy deprived compared to $T_1$ and $T_2$. Hence, energy
efficiency is critical for the relay while this is not necessarily 
the case for the remaining nodes that are relatively energy-rich. 
Fig. 8. Optimal cumulative harvested energy and consumed energy policies
for (a) node $T_1$ and sum of $T_1$ and $T_2$, and (b) node $T_3$, for a symmetric
full-duplex channel with decode-and-forward relaying and $h_{13} = h_{23} =
-110$ dB, peak energy harvesting rates $E_{h,1} = E_{h,2} = E_{h,3} = 50$ mJ,
battery sizes $E_{1,\max} = E_{2,\max} = E_{3,\max} = 50$ mJ.


This results in the optimal policy being largely dictated by the 
relay. Note that in Figure 7 the relay follows a cumulative
energy that resembles the shortest path through the feasible
energy tunnel, while for $T_1$ and $T_2$ this is not the case. In
contrast, in Figure 8 the multiple access phase is more likely
to be limiting because the sum-rate with equal transmit powers 
at all nodes is limited by the sum-rate constraint of the multiple 
access phase, see (1^ . Thus, the total cumulative energy, 
denoted with $T_1$&$T_2$ in Figure 8(a), follows the shortest path
within the tunnel, similar to the optimal policy for the multiple 
access channel in ini. However, broadcast powers do not 
yield binding constraints, implying that contrary to the energy 
harvesting models previously studied, e.g., 0 , 0 , the optimal 
policy for the EH-TWRC is not necessarily unique. 

Comparing Figures 9 and 10, which show optimal policies
for the half-duplex model, we observe that the time
division parameters $\Delta_n$ play an important role in helping
energy deprived nodes. By properly selecting $\Delta_n$, the effect




Fig. 9. Optimal cumulative harvested energy and consumed energy policies
for (a) node $T_1$ and sum of $T_1$ and $T_2$, and (b) node $T_3$, for an asymmetric
half-duplex channel with decode-and-forward relaying and $h_{13} = -110$ dB,
$h_{23} = -116$ dB, peak energy harvesting rates $E_{h,1} = E_{h,2} = 50$ mJ,
$E_{h,3} = 10$ mJ and battery sizes $E_{1,\max} = E_{2,\max} = 50$ mJ and
$E_{3,\max} = 10$ mJ.


of unbalanced energy harvests at the sources and the relay can
be mitigated. However, this still does not imply the shortest
path is optimal for each node. This is due to the interplay of
transmit powers through the joint rate function in the objective.
Hence, whenever the transmit power changes for one user due
to a full battery or an empty battery, the transmit powers of
other users are affected as well. Examples of this phenomenon
can be found in Figure 9 at $t = 3, 4$ s, where the energy
depletion in $T_3$ is observed to affect the transmit powers of
$T_1$ and $T_2$, and in Figure 10 at $t = 2$ s, where the energy
depletion in $T_1$ and $T_2$ is observed to affect the transmit power
of $T_3$.

Remark 3: Similar results were observed for compress-and-forward,
compute-and-forward, and amplify-and-forward relaying
through simulations. We observed that identical energy
harvesting profiles and channel parameters yield transmit powers
that only differ slightly among relaying schemes. However,
the multiple access phase fractions, $\Delta_n$, $n = 1, \ldots, N$,
differ notably among relaying schemes in order to achieve
matching multiple access and broadcast rates within each
epoch. Due to the similarity of the transmit power policies,
to avoid repetition, we omit the plots for these schemes.

Fig. 10. Optimal cumulative harvested energy and consumed energy policies
for (a) node $T_1$ and sum of $T_1$ and $T_2$, and (b) node $T_3$, for a symmetric
half-duplex channel with decode-and-forward relaying and $h_{13} = h_{23} =
-110$ dB, peak energy harvesting rates $E_{h,1} = E_{h,2} = E_{h,3} = 50$ mJ,
battery sizes $E_{1,\max} = E_{2,\max} = E_{3,\max} = 50$ mJ.

Next, we compare the performance of the optimal offline 
and online policies with upper and lower bounds for a decode- 
and-forward relay. We obtain a non-energy-harvesting upper 
bound by providing the total energy harvested by each node at 
the beginning of the transmission without a battery restriction. 
We also present two naive transmit power policies, namely 
the hasty policy and the constant power policy, as lower 
bounds. The former policy, also referred to as the spend- 
as-you-get algorithm ESI, consumes all harvested energy 
immediately within the same epoch. The latter policy chooses 
the average harvest rate at each node as the desired transmit 



Fig. 11. Sum-throughput with optimal power allocations for decode-and- 
forward relaying compared with a non-EH upper bound, hasty policy and 
constant power policy. 

power, and transmits with this power whenever possible. For
both naive policies, the phase fraction parameters $\Delta_n$ that
maximize the instantaneous sum-rate for the given transmit
powers are chosen within each epoch. We consider a half-duplex
EH-TWRC with $h_{13} = h_{23} = -110$ dB, and choose
the energy harvests for node $T_j$ to be independent and uniformly
distributed over $[0, E_{h,j}]$ where $E_{h,2} = 50$ mJ and
$E_{h,3} = 20$ mJ are the peak harvest rates. The infinite horizon
online policy is found using a discount factor of $\beta = 0.999$.
The sum-throughput values resulting from these policies with
a half-duplex relay in $N = 10$ epochs, averaged over 100
independently generated scenarios, are plotted in Figure 11. In
the figure, the peak harvest rate for node $T_1$, $E_{h,1}$, is varied in
order to evaluate the performance of the policies at different
harvesting rate scenarios. We observe that the optimal online
policy, found for a horizon of $N = 10$ epochs, as well as
its infinite horizon counterpart perform notably better than the
naive policies.
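The two naive baselines can be written down in a few lines for a single node. This is a sketch under assumptions that the text leaves open: the function names are hypothetical, the "average harvest rate" is taken as the empirical average of the given harvests, and when the battery holds less than the target the node is assumed to transmit with whatever energy remains.

```python
def hasty_policy(E, l):
    """Spend-as-you-get: consume each harvest within its own epoch."""
    return [E_n / l_n for E_n, l_n in zip(E, l)]


def constant_power_policy(E, l, E_max):
    """Aim for the average harvest rate in every epoch, limited by the battery."""
    target = sum(E) / sum(l)                  # assumed: empirical average harvest rate
    battery, powers = 0.0, []
    for E_n, l_n in zip(E, l):
        battery = min(E_max, battery + E_n)   # overflow beyond E_max is lost
        used = min(target * l_n, battery)     # assumed: spend what is left if short
        powers.append(used / l_n)
        battery -= used
    return powers
```

For harvests `[2, 0, 4]` with unit epochs and a large battery, the hasty policy returns `[2, 0, 4]`, while the constant power policy targets 2 per epoch but has nothing to spend in the second epoch, giving `[2, 0, 2]`.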

Finally, we compare the sum-throughput resulting from
decode-and-forward, compute-and-forward, compress-and-forward,
amplify-and-forward, and hybrid strategies in an
EH-TWRC. The same parameters as in Figure 11 are used
in simulations. The sum-throughput values obtained over a
duration of $N = 10$ epochs are plotted in Figure 12. We
observe that for low and high transmit powers, either decode-and-forward
or compute-and-forward outperforms the other,
respectively, while they both exceed the sum-throughput values
of compress-and-forward and amplify-and-forward relaying.
However, as expected, the hybrid strategy outperforms all
single-strategy approaches, since it performs at least as well
as the best one in each epoch.

Fig. 12. Sum-throughput with various relaying strategies against peak harvest
rates for node $T_1$. The compress-and-forward and amplify-and-forward
strategies are omitted since they perform notably worse than those in the plot.

VIII. Conclusion

In this paper, we considered the sum-throughput maximization
problem in a two-way relay channel where all nodes
are energy harvesting with limited battery storage, i.e., finite
battery. We considered decode-and-forward, compress-and-
forward, compute-and-forward and amplify-and-forward relaying
strategies with full-duplex and half-duplex radios. Noticing
that the best relaying strategy depends on instantaneous
transmit powers, we proposed a hybrid relaying scheme that
switches between relaying strategies based on instantaneous
transmit powers. We solved the sum-throughput maximization
problem for the EH-TWRC using an iterative generalized
directional water-filling algorithm. For cases where offline
information about energy harvests is not available, we formulated
dynamic programs which yield optimal online transmit
power policies. Simulation results confirmed the benefit of the
hybrid strategy over individual relaying strategies, and the improvement
in sum-throughput with optimal power policies over
naive power policies. The online policies found via dynamic
programming also proved to perform better than their naive
alternatives. It was observed that in a two-way channel with
energy harvesting nodes, either of the communication phases,
i.e., broadcast or multiple access phases, can be limiting,
impacting the optimal transmit powers in the non-limiting
phase as well. Thus, the jointly optimal policies were observed
not to be the throughput maximizers for each individual node,
or the sum-throughput maximizers for a subset of nodes - a
fundamental departure from the structure of optimal policies in
previous work |l4l-||6l, ifTTl . ifTSll . lfT4ll . 

We remark that the offline throughput maximization problem
for the full-duplex and half-duplex cases when decode-and-forward
relaying is used can also be solved using the
subgradient descent algorithm as shown in BTIl . Future directions
for this channel model include optimal offline and online
power policies for more involved models with data arrivals
at the sources, data buffers at the relay, or a direct channel
between sources.